74 research outputs found
Basic Filters for Convolutional Neural Networks Applied to Music: Training or Design?
When convolutional neural networks are used to tackle learning problems based
on music or, more generally, time series data, raw one-dimensional data are
commonly pre-processed to obtain spectrogram or mel-spectrogram coefficients,
which are then used as input to the actual neural network. In this
contribution, we investigate, both theoretically and experimentally, the
influence of this pre-processing step on the network's performance and ask
whether replacing it with adaptive or learned filters applied directly
to the raw data can improve learning success. The theoretical results show
that approximately reproducing mel-spectrogram coefficients by applying
adaptive filters and subsequent time-averaging is in principle possible. We
also conducted extensive experimental work on the task of singing voice
detection in music. The results of these experiments show that for
classification based on Convolutional Neural Networks the features obtained
from adaptive filter banks followed by time-averaging perform better than the
canonical Fourier-transform-based mel-spectrogram coefficients. Alternative
adaptive approaches with center frequencies or time-averaging lengths learned
from training data perform equally well.
Comment: Completely revised version; 21 pages, 4 figures
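The idea of a filter bank followed by time-averaging, as described in the abstract above, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `filterbank_features`, the Hann-windowed complex exponential kernels, and the parameters `filter_len` and `avg_len` are all assumptions chosen for illustration, and the filters here are fixed rather than adaptive or learned.

```python
import numpy as np

def filterbank_features(signal, sample_rate, center_freqs,
                        filter_len=256, avg_len=128):
    """Band-pass filter a raw signal, then time-average the band energies.

    Hypothetical sketch: each filter is a Hann-windowed complex exponential
    centered on one frequency (an assumption, not the paper's design).
    """
    n = np.arange(filter_len) - filter_len / 2
    window = np.hanning(filter_len)
    features = []
    for fc in center_freqs:
        # Complex band-pass kernel centered at fc (in Hz).
        kernel = window * np.exp(2j * np.pi * fc * n / sample_rate)
        response = np.convolve(signal, kernel, mode="same")
        energy = np.abs(response) ** 2
        # Time-averaging: mean energy over non-overlapping frames.
        n_frames = len(energy) // avg_len
        frames = energy[: n_frames * avg_len].reshape(n_frames, avg_len)
        features.append(frames.mean(axis=1))
    # Shape: (number of filters, number of frames).
    return np.array(features)

# Usage: a 440 Hz tone sampled at 8 kHz, analyzed in two bands.
t = np.arange(8000) / 8000.0
tone = np.sin(2 * np.pi * 440 * t)
feats = filterbank_features(tone, 8000, [440.0, 2000.0])
```

In this sketch the center frequencies and the averaging length are plain arguments. In a learned variant they would presumably become trainable parameters.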
An investigation of likelihood normalization for robust ASR
Noise-robust automatic speech recognition (ASR) systems rely on feature and/or model compensation. Existing compensation techniques typically operate on the features or on the parameters of the acoustic models themselves. By contrast, a number of normalization techniques have been defined in the field of speaker verification that operate on the resulting log-likelihood scores. In this paper, we provide a theoretical motivation for likelihood normalization due to the so-called "hubness" phenomenon and we evaluate the benefit of several normalization techniques on ASR accuracy for the 2nd CHiME Challenge task. We show that symmetric normalization (S-norm) reduces the relative error rate by 43% when used alone and by 10% when applied after feature and model compensation.
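For orientation, symmetric normalization as commonly defined in speaker verification averages two z-normalized versions of a score, one using cohort statistics from the enrollment side and one from the test side. The sketch below follows that common definition. How the paper maps it onto ASR log-likelihoods is not stated in the abstract, so the function name `s_norm` and both cohort arguments are illustrative assumptions.

```python
import numpy as np

def s_norm(score, enroll_cohort_scores, test_cohort_scores):
    """Symmetric normalization (S-norm) of a single raw score.

    Hypothetical sketch: the cohort score lists are assumed to be
    precomputed scores against a set of impostor (cohort) models.
    """
    e = np.asarray(enroll_cohort_scores, dtype=float)
    t = np.asarray(test_cohort_scores, dtype=float)
    # Z-normalize the score against each cohort distribution.
    z_enroll = (score - e.mean()) / e.std()
    z_test = (score - t.mean()) / t.std()
    # S-norm is the average of the two normalized scores.
    return 0.5 * (z_enroll + z_test)

# Usage with toy numbers (not data from the paper).
normalized = s_norm(2.0, [0.0, 2.0], [1.0, 3.0])
```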
A Hybrid Approach to Music Playlist Continuation Based on Playlist-Song Membership
Automated music playlist continuation is a common task of music recommender
systems that generally consists of providing a fitting extension to a given
playlist. Collaborative filtering models, which extract abstract patterns from
curated music playlists, tend to provide better playlist continuations than
content-based approaches. However, pure collaborative filtering models have at
least one of the following limitations: (1) they can only extend playlists
profiled at training time; (2) they misrepresent songs that occur in very few
playlists. We introduce a novel hybrid playlist continuation model based on
what we name "playlist-song membership", that is, whether a given playlist and
a given song fit together. The proposed model regards any playlist-song pair
exclusively in terms of feature vectors. In light of this information, and
after having been trained on a collection of labeled playlist-song pairs, the
proposed model decides whether a playlist-song pair fits together or not.
Experimental results on two datasets of curated music playlists show that the
proposed playlist continuation model performs comparably to a state-of-the-art
collaborative filtering model in the ideal situation of extending playlists
profiled at training time and where songs occurred frequently in training
playlists. In contrast to the collaborative filtering model, and as a result of
its general understanding of the playlist-song pairs in terms of feature
vectors, the proposed model is additionally able to (1) extend non-profiled
playlists and (2) recommend songs that occurred seldom or never in
training playlists.